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Abstract 

Calibration refers to the process of adjusting features of a computational model that 
are not observed in the physical process so that the model matches the real process. 

We propose a framework for calibration when the unobserved features, i.e. calibration 
parameters, do not assume a single value, but are functionally dependent on other 
inputs. We demonstrate that this problem is curve to surface matching where the 
matched curve does not possess the same length as the original curve. Therefore, we 
perform non-isometric matching of a curve to a surface. Since in practical applications 
we do not observe a continuous curve but a sample of data points, we use a graph- 
theoretic approach to solve this matching of incomplete data. We define a graph 
structure in which the nodes are selected from the incomplete surface and the weights 
of the edges are decided based on the response values of the curve and surface. We 
show that the problem of non-isometric incomplete curve to surface matching is a 
shortest path problem in a directed acyclic graph. We apply the proposed method, 
graph-theoretic non-isometric matching, to real and synthetic data and demonstrate 
that the proposed method improves the prediction accuracy in functional calibration. 

Keywords: Matching, non-isometry, calibration, computer experiments, graph short¬ 
est path 

1 Introduction 


Calibration refers to the problem of specifying the values of some features, known as cal¬ 
ibration parameters, in a computational model, which are unobservable in the associated 


physical system (Kennedy and O’Hagan, 

2001; 

Han et al. 

2009a 

Higdon et ah, 

2004a 

Tuo and Wu. 

2014 

). Here, a computational model refers to a set of mathematical forrnu- 


lations and assumptions implemented through computer codes, and is intended to behave 
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similar to the physical system, i.e. they both generate the same response for a given input 
variable. If in the physical system, a calibration parameter assumes a range of values as 
a function of other variables in the system, we call this problem functional calibration. 


Functional calibration is motivated by examples in nano-manufacturing (Pourhabib et al. 


2015), resistance spot-welding (Bayarri et al. 2007), and bioenergy production (Kumar 


et al. 2009). For example, in Poly-vinyl Alcohol (PVA)-treated buckypaper fabrication, 


we are interested in understanding the relation between the response, the tensile strength, 
and the input variable, the PVA amount. Here, the calibration parameter is the percentage 
of PVA absorbed, which may change in the physical system depending on the values of the 


input variable, the PVA amount (Pourhabib et al. 2015). 


From a geometric perspective, functional calibration is a curve to surface matching 
problem. Here, all the possible values for the input variable and the response constitute a 
one-dimensional curve. Note that the reason the physical system is denoted by a curve in 
a two-dimensional space is we cannot measure or observe the calibration parameter, also 


referred to as the latent variable (Pourhabib et al., 2015), in the physical process. As such 


what we actually observe only consists of input variables and the responses which form a 
curve in the input-response space. On the other hand, in the computational model, we 
can specify the values of both the input and the calibration parameter. Consequently, all 
the possible values of the variable, calibration parameter, and the computational model 
response together form a surface (see Figure [T]). If the problem is point calibration, i.e., 
there exists one single value for the calibration parameter, then we need to match the 
curve from the physical system to the surface from the computational model, since the 
computational model is intended to represent the physical process. 

We note that the physical curve we observe is in fact a projection of a curve in the 
three dimensional input-parameter-response space. The reason lies behind the nature of 
parameters in a physical process: for each input there exists an (unknown) value for the 
parameter, and these two features, i.e the input and parameter, determine one single 
response. However, since we do not observe the actual value of the parameter in the 
physical process, we only see a projected curve on input-response surface. Hence, this 
problem is to recover the original curve, or in other words, determine a non-isometric 
match of a curve to a surface. The non-isometry is due to the fact that the curve in the 
projected space has a different length than the original curve. Therefore, this is in principle 


different from isometric matching problems (Gruen and Akca, 2005 Bronstein et al., 2005 


Baltsavias et al., 2008) 


Furthermore, in functional calibration we do not observe a complete curve or surface, 
but a set of scattered data points obtained from the physical experiments or the computa¬ 
tional model. The data points from the former lie on the projected curve we observe and 
the data from the latter lie on the input-parameter-response surface. Therefore, what we 
observe is incomplete data and we aim to match non-isometrically an incomplete curve to 
an incomplete surface, which is equivalent to solving the functional calibration problem. 

The incompleteness of the data, that is scattered on a curve or a surface instead of 
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a continuous function, can be addressed by using kernel interpolation 
processes (GPs) (Stein, 1999) 


such as Gaussian 
In fact, to handle functional calibration, both parametric 


and non-parametric approaches (Pourhabib et al. 2015, 2014) use kernel interpolation 


After interpolating the data points to get a continuous surface, the former assumes a 
parametric relationship between the inputs and calibration parameters and then estimates 
the unknown parameters of that function, whereas the latter, the non-parametric approach, 
formulates the problem using the concepts of reproducing kernel Hilbert spaces (RKHS) 
and then finds a solution using the representer’s theorem (Scholkopf et al. 2001). 

However, using interpolation introduces a new source of error: assuming a GP, for 
example, relies on the joint Gaussian distribution which is not always the case in practice. 
Therefore, we argue if we operate on the very data points we will obtain a better matching 
in this case. In addition, the existing methods appeal to iterative numerical techniques 
for optimization which are prone to get trapped in local optima. The interaction between 
these two issues, interpolation and iterative optimization, further exacerbates the quality 
of the optimal solution, as the methods converge to a local optimum of the model based 
on the interpolated surface. 

In this paper, we propose a graph-theoretic approach for non-isometric curve to surface 
matching from incomplete data for functional calibration. We reformulate the functional 
calibration problem as a shortest path problem on a directed acyclic graph (DAG) which 
can be solved efficiently with a fast polynomial-time algorithm. The shortest path found 
identifies anchor points from the computational model points that balance two basic crite¬ 
ria: First, the anchors should offer the best match to the response of the physical system, 
and second, they should facilitate fitting of a smooth function (not exhibit erratic fluc¬ 
tuations in values). The anchor points are then used to fit a response function that can 
then be used to make predictions on the physical process. Applying the overall framework 
developed in this paper on both synthetic and real data, we demonstrate that the new 
approach outperforms the alternative approaches in terms of prediction accuracy. 

The rest of this paper is organized as follows: Section [2] formally defines the problem 
and reviews the relevant literature. Section [3] outlines our proposed approach to solve 
the non-isometric curve to surface matching. Section [4] presents the numerical results and 
makes suggestions regarding parameter selection. Section [5] concludes the paper. 


2 Non-isometric matching 

2.0.1 Problem Definition 

We consider a physical system that operates according to a set of (possibly unknown) 
physical laws. In this system, we assume there is a functional relationship between a group 
of features and an output. The physical system has an input variable denoted by x 6 M. 
Input variables represent those values that can be observed or specified. As such, the 
vector of input variables are also called control variables. We also assume that once in the 
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physical system the control variables are set (either observed or specified), the physical 
process generates a real valued response. However, the response is not merely a function of 
those control variables, but a function of both the control variables and parameters 6 e M. 
The latter denotes those features of the physical systems that are either hard to measure or 
control. Hence, to express this functional relationship we denote the response as y p (x\6). 
Likewise, we denote the output of the computational model by y c (x, 9), although here, 9 is 
also an input argument. An example of such a complex system is PVA-treated buckypaper 
fabrication, where we are interested in finding the relation between the input variable PVA 
amount and the response, namely the buckypaper tensile strength. Here, the percentage 
of PVA absorbed is a calibration parameter, since it cannot be measured in the physical 
system. 

Note that, assuming our computational model is constructed according to the laws 
governing the physical system, in a computational model the same concepts hold: we have 
a “system” whose structure is determined by the interactions between some input/control 
variables and parameters. However, since in a computer code we are not restricted to 
measuring any physical features, we can set the values of both x and 9 arbitrarily. 

Of course, the goal is to design a high-fidelity computational model that operates similar 
to the physical system. To this end, we need to obtain the physical responses {yf ..., ym} 
at a set of physical inputs V = {ij,...,4i}, and the computational model responses 
{yi,---,Vn} at the model inputs Q = {(xf, Of),Of)}, where y? = y p (x^-,9^) and 
yf = y c (xf Of), for j = 1 ,m and i = 1,..., n. In general, due to cost considerations, 
m <C n, i.e., obtaining data points from the computational model is generally cheaper than 
getting data from the physical experiment. As such, we also assume that for each x? G T> 
there exists at least an i € {1, ..., n} such that x\ = xf i.e., for any physical input we have 
at least one model input (xf Of) where the control variable in the computational model is 
the same as that in the physical system. The goal in calibration is to adjust the parameter 
9 , such that the computational model represents the physical system in the sense that the 
computational model can predict the physical response at any input location xf 


We call any method seeking a single value 9 6 M that matches the computational 
model with the physical system, point calibration (this class of methods is simply called 


calibration in the literature; 

see (Tuo and Wu[ 2014 Bayarri et al.| 2007 

Joseph and 

Melkote 

2009; 

Higdon et al. 

2004b; Han et al. 

2009b 

Xiong et al.j, 2009))). On the other 


hand, functional calibration refers to finding a function /, where 9 = f(x), such that the 
output of the computational model matches that of the physical system for any given x, 
i.e. / is a solution to 


min 

/e-F 


{£ (y p (x; 0),y c (x, 9)) \9 = f{x)}, 


(1) 


where J 7 is a space of functions and £(•,•) is a loss function that measures the discrepancy 
between the responses of the physical system and the computational model. What we call 


functional calibration is also referred to as local calibration (Pourhabib et ah, 2014) or 
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latent variable estimation (Pourhabib et al. 2015). 


Functional calibration has an interesting geometric interpretation. Note that y p (x;0) 
as a function of mere x constitutes a one-dimensional curve. On the other hand, y c (x,9 ) 
as a function of both x and 0 is a two-dimensional surface. So, the optimization prob¬ 
lem (jT]) is matching a one-dimensional curve to a two-dimensional surface. However, this 
is different from traditional matching problems (Gruen and Akca, 2005), since the length 
of the one-dimensional curve is not preserved. Specifically, if (x, 0, y p (x; 0)) represents the 
true physical curve, what we observe is its projection (x, 0, y p (x; 9)) E R 3 , which is a one 
dimensional curve. On the other hand, the two-dimensional surface is set of all points 
(x, 9 , y c (x, 9 )) E R 3 . Now, we are trying to find 9 = /(x) such that the curve (x, 9 , y p (x; 9)) 
lies on the surface (x, 0, y c (x, 0)) E R 3 ; however, the length of the curve (x, 9, y p (x; 0 )) 
is larger than that of (x, 0, y p (x; 9)), and consequently the matching procedure is non¬ 
isometric. As such, we note that functional calibration for the case where 9 and x E R is 
non-isometric curve to surface matching. 

In practice we do not have the entire curve or surface, but scattered data points from 
those sets. Recall that we are provided with the physical inputs D = { x \■ : j = 1,... ,m} 
and model inputs Q = {(x?, 9f) : i = 1 ,... ,m} from the computational model. As such, 
we have {(x?, 0, y \),..., (x^, 0, y^)}, and {(xf, , yf),..., (x£, 0£, y£)} sampled from the 

projected curve and the surface, respectively, where yj = y p (a^;/(xp) for j = 1 ,m, 
and yf = y c (x p , 9,) for i = 1,,n. Under this setting, the problem is then non-isometric 
matching of an incomplete curve to an incomplete surface, which we call non-isometric 
matching with incomplete data. Once the function / is estimated, we can predict the 
physical response at any point x£ by y c (x^, /(x*)). See Figure [l] for an illustration. 


2.0.2 Related Work 

The characterization of function / determines the solution approach to problem ([!]). One 
way is to characterize / as an element of a reproducing kernel Hilbert space (RKHS) (Tuo 
and Wu 2014). Assume : R x R —> R denotes a symmetric positive semidefinite kernel, 


i.e., 


&(x,x , )f(x)f(x')dfi(x)dy(x') > 0, 


for any / E L 2 (R r ,/r) where g is a measure ( Rasmussen and Williamsf 2006), which can 
define a space of smooth functions: For a given Xj E R, the function 4>(x,Xj) is a real¬ 
valued function. Then, define the space of all functions constructed as a linear combination 
of 4>(x,Xj) for xj E R, as: 


F*(R) 


QO j 

EW,x,),^ e R,xj e R > 
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Computational Model Surface 
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Data Points from Computational Model 
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Figure 1: Curve to surface matching for a synthetic dataset used in Section 4.0.3 The top 
plot shows the complete surface and curve. The computational model data points lie on a 
two-dimensional surface. The true physical curve needs to be recovered by projecting back 
the observed physical curve to the 3D space. In practice, however, what we observe are 
scattered data points sampled from the complete curve and surface, which is depicted in 
the bottom plot. 
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We can show that the closure of F$(R) with respect to the norm induced by the inner 
product 

OO OO OO OO 

j =1 1= 1 3=1 1= 1 

is an RKHS (Wendland, 2005, FasshaueTf |2011 ). Then, non-parametric functional cali¬ 
bration (NFC) fPourhabib et al. 2014[ ), given a kernel, seeks a function / in that RKHS 
as a solution to a variation of problem ([I]). Specifically, NFC solves 


1 2 
™n-^{y p (x p j ;f(x p j ))-y c (x p j ,f(x p j ))} +7II/IP 


( 2 ) 


3 =1 


where / = /3j$(-,Xj), and ||/|| 2 = YljL 1 ZXi x i), 7 > 0 is a regularization 

parameter. The generalized representer’s theorem (Scholkopf et al. 2001) guarantees that 

m A 

any local optimum f(x) = x P A, i.e. a linear combination of the kernel evaluations 


3 = 1 


at V = {x p ,..., Xm}- Therefore, we need to find j3 such that 


1 m 

& = -f-r - E {y p B; /K)) ~ y c ( xP 3 > /(*? 


where 


j=i 

m m 

+7 ^2^2 PjPeHtf, x p e ), 
j =1 *=1 

m 

f{x P j) = 

£=1 


(3) 


which is a non-convex optimization problem in general, due to the fact that /3 is an argument 
of the non-convex function y c (-, •). 

Note that in optimization problem ([ 3 ]), we have y c (x p , f{x p )) which means we evaluate 
the computational model at V. This poses no issue since we assumed for each x p - G T> 
there exists at least an i G {l,...,n} such that x\ = x p . However, since we solve ([3]) 


using iterative numerical techniques, such as trust region methods (|Byrd et al. 2000; Conn 


et al. 2000), or steepest descent (|Luenberger and Ye[ |2008[), we may have to evaluate 


y c (-, •) at points other than those in Q. Therefore, we need to create a surrogate model 
that interpolates the computational model. Specifically, let ^(x, 0) denote a real valued 
function defined on M 2 , where y°(xf, Of) = y c (x£,0?), for every (xf,0f) G Q. i.e. •) 
interpolates y c (-,-)- Furthermore, if at any point (x,9) G M 2 \£/, y c (-,-) provides a “good 
guess” of what y c (-,-) would be, we call y c (-,-) a surrogate model for y c (-,-). A popular 
choice for the surrogate model is based on GP regression (Stein 1999| Rasmussen and 


Williams, 2006). 
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An alternative to using a non-parametric form for /, namely a function in the RKHS, 
is to assume a parametric form. For example, guided by the physics of the buckypaper 
fabrication process (Pourhabib et al. 2015), we can assume 0 = f(x) = a exp— fix. Then, 


the functional calibration problem seeks (a *, /3 *) such that 


m 

(«*,/?*) = argrnin ^ a exp{-/3x P j}) 

j=1 


f(^,aex p{-/3®?}) , 


(4) 


where we use y °(-, ■) again as a surrogate model as solving 0 requires appealing to 
iterative numerical techniques that need to evaluate the computational model at points not 
necessarily in Q. We call this approach parametric functional calibration (PFC) (which is 
also referred to as latent variable estimation in (Pourhabib et ah, 2015)). 

PFC is well suited for the buckypaper fabrication problem, however, for most appli¬ 
cations we cannot assume a parametric form for the calibration function / a priori. In 
addition, using a surrogate model introduces a new source of error, since we are interpolat¬ 
ing the computational model surface. We cannot overcome the issue of using a surrogate 
model in PFC, or even NFC, since they are both solved using iterative numerical techniques 
that require evaluating the objective function at arbitrary points on its domain. Note that 
once we use a surrogate model in the optimization procedure, the point identified as opti¬ 
mal may be far way from the actual optimal point. In the following section, we propose an 
alternative framework in which we employ the computational model data points directly 
to determine the actual physical curve. 


(Pourhabib et ah, 2015 


3 Graph Shortest Path for Non-isometric Matching 


The intuition that the functional calibration problem is non-isometric curve to surface 
matching, and that we are provided with scatter data points from the curve and the 
surface, motivates us to view the problem through a discrete combinatorial lens. In this 
approach, we model functional calibration as a combinatorial optimization problem on a 
graph. 


Unlike NFC and PFC discussed in Section 2.0.2 we do not identify a class of functions 
a priori. Instead we seek to identify anchor points, among computational model points, 
which, intuitively speaking, are “close” to the points on the true physical curve. In other 
words, the anchor points are positioned such that the true physical curve passes through 
neighborhoods of those points. To find these anchor points, we require two properties to 
hold: (i) the computational model response should be close to the physical response for 
a given input x: and (ii) the parameter values for two consecutive anchor points should 
be close to each other. The former ensures the computational model identifies the points 
that have the same response as that of the physical system, and the latter ensures the 
smoothness of the estimated physical curve. Next, we show how we translate this idea and 
the properties into a rigorous graph-theoretic framework. 










Without loss of generality, we assume that both the physical inputs V = { x \• : j = 
1 , ...,m} and model inputs £/ = {(x?, 0?) : i = l,...,n} are strictly ordered such that 
x^ < x^ +1 , for j = 1,..., m — 1, and x\ < x? +1 , for i = 1,..., n — 1. Then, we construct a 
directed, edge-weighted graph, G = ( V ., E) where 1/ = F° U {0, n + 1}, and the vertices in 
V° = {1,..., n} correspond to the model inputs in Q. 

We partition the V° into m clusters {Ci,...,C m } as follows: Consider any vertex 
i G V° corresponding to (x£,0£) 6 £/. Then, i is assigned to a unique cluster C-/ for some 
j G {1,.. •, to} as follows: 

i G Cj if j = argmin{|x^ — x£\ : i = 1,... ,m}. (5) 

Note that, if the minimum in ([5]) is achieved by more than one j G {1,..., to}, then the tie 
can be broken arbitrarily. As a consequence, each cluster Cj is in 1-to-l correspondence 
with the physical input x p . We can now describe the set of directed edges E as follows: 

E = [J {(u,u) : u G Cj,v G C^+i} |J{(0, rx) : u G Ci}(J{(u,n+ 1) : u G C m } (6) 

j=l,...,m—l 

The first term corresponds to all the directed edges that go from vertices in Cj to vertices 
in Cj. )-i. The second term corresponds to the directed edges from the artificial vertex 0 to 
each vertex in C\. The third term corresponds to directed edges from vertices in C m to 
the artificial vertex n + 1. 

The final critical step is assigning weights w uv to each edge (u, v) G E. Consider two 
consecutive clusters Cj and Cj + \ for any j = 1,... ,m and let u G Cj and v G C ]+ \. Define 
w uv as follows. 

Wuv = \y c u ~ y p u \ + A0 C U ~ 0 C V \, (7) 

where A > 0 is a scaling parameter. The weights wo tU \/u G C\ and w u>n+ i\/u G C m 
are identically zero. Notice that the edge-weight for any edge between two consecutive 
clusters j and j + 1 consists of two parts: the first representing the difference between the 
model response and physical response; the second part represents the difference between 
the calibration parameters of j and j + 1. 

On this graph G, we intend to solve the shortest (simple, directed) path problem from 
origin vertex 0 to destination vertex n + 1. Every path from 0 to n + 1 in G has exactly 
to T 1 arcs, by construction. Therefore, the to internal vertices on the shortest path that is 
found will serve as the anchor points for fitting the function. See Figure [2] for illustration. 

Observe that, G = {V,E) as described here is a directed acyclic graph (DAG). Further¬ 
more, we claim that 0,1,..., n, n + 1 is a topological ordering of vertices of G. To establish 
this claim it suffices to show that if ( u , v ) G E, then u < v. Note that u < v x£ < x£ 

as we have assumed the inputs to be strictly ordered. Suppose u G Cj and v G Cj +1 
where j G V° \ {to}. Hence, x p < x p +1 , j = argmin{|x^ — x£| : t = l,...,m} and 
j + 1 = argmin{|x^ — x{,| : £ = 1 ,... ,m}. Since, x p ,x^ : Xy and x p +l are points on a real 
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Figure 2: Graph representation of the functional calibration problem for the case where 
m = 5 and n = 13. Crosses represent data points from the computational model and solid 
circles show the physical data points which correspond to clusters C\ through C m . The 
darker crosses represent the anchor points, and the solid arrows identify the edges in the 
shortest path. 


line, we must have x £ < x c v . Therefore, we can solve the shortest path problem in 0(\E\) 
complexity using the reaching algorithm for DAGs (Lawler 1976) (see Algorithm [I]) . 


Algorithm 1 Reaching algorithm for DAGs 

1 

dist( 0) := 0 ,dist(u) := 

oo V u G P\{0} 

2 

pred( 0) := 0 ,pred(u) : 

= —IV «£ V \ {0} 

3 

for u = 0,..., 

n do 


4 

for each (u, 

v) € E 

do 

5 

if dist{v) 

> dist(u) + w uv then 

6 

dist(v ) 

:= dist{ 

u) + w uv 

7 

pred(v) 

:= u 


8 

end if 



9 

end for 



10 

end for 




Suppose 0 — vi — Vi — ■ ■ ■ — v m — (n + 1) is the shortest path identified. The points 
• j (x£ ,)} serve as the anchor points for fitting the function. Note that 
we are only interested in identifying these “optimal” anchor points that provide us with 
the information to estimate the “best” function / (within this discrete optimization frame¬ 
work). 

Given the anchor points, one approach is to fit a piecewise-linear function with the 
anchor points as break-points, or identify linear pieces via regression within clusters, with 
the additional constraint that the linear pieces coincide at the boundary resulting in a 
continuous piecewise-linear function. Alternately, we can use GP regression to interpolate 
between these points. Based on our experiments, the latter, i.e., GP regression, provided a 
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better solution in terms of predictive power. Specifically, we fit a GP on the anchor points 

{«!> 0Sl)> ■ • • > where {®5l> ■ • • > } are the indeX Set and i e v l» ■ • ■ are 

noise-contaminated output of the GP /, i.e. = f(x%) + €j, for j = 1 where 

ej ~ jV(0, a 2 ) and a 2 > 0. 

We call this approach graph non-isometric matching (GNM), which characterizes T 
in optimization © as the space of smooth GP posterior functions. The proximity of the 
anchor points to the physical responses is emphasized by our construction (clustering and 
edge-weights), enables a good match between the physical system and the computational 
model. 


4 Experiments 


This section presents the numerical results for the proposed method, GNM, and compares 
its performance with two competing methods, PFC and NFC. We use one real and two syn¬ 
thetic datasets. The measure of performance is prediction accuracy; after using a training 
dataset we select a set of test data points, T = {x^, x* 2 ,... x* nt j and then for each method we 
find the predicted response values {yi, y 2 > • • • Vn t }- We compare the predicted values with 
the actual physical responses at those points, i.e. {y p (x\] Q\), y p (x?i; 0%), ■ ■ ■ y p (x* t ; 0* t )}. 
Then, we compute the root mean squared error (RMSE) 

nt 

RMSE = ^2(y i -yP(x*;e*)f, (8) 

2=1 


for each method. A method with a smaller RMSE is deemed superior. 

To create training and test cases we follow two approaches: we can select the test data 
set T to be the set of inputs in the training data set, i.e. T = {x 2 , ..., Xm}- The second 

approach is to use 4-fold cross validation (Hastie et ah, 2001). We call the error found 
through the latter the test error and the former the training error. 

For synthetic datasets, since we know the actual functional relationship between the 
calibration parameter and the input variable, we can also calculate the RMSE for the 
calibration parameter prediction. Specifically, 


nt 


RMSEfl = \ 6i ~ 9 i 


(9) 


2=1 


where 6i is the estimated parameter at the input x*, i.e., = f(x*), where /(•) is the 

estimated calibration function, and 9* is the actual parameter. 


4.0.3 Synthetic Data 


We use two synthetic datasets, initially designed to test NFC (Pourhabib et ah, 2014) 


however, we use different ranges for the input variables and slightly modify the way the 
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input locations are selected as explained. First, we consider a physical system with 
a response function y p (x]9) = exp(^j) sin(x) and an associated computational model 
y s (x,9) = exp(^j) sin(x)^. We note that when 9 = .5x the computational model matches 
the physical response. As such, the true functional relationship is given by f(x) = 0.5x. 
We name this dataset SD1. The second synthetic dataset we consider displays a more com¬ 
plicated physical response and a nonlinear relationship between the calibration parameter 
and control variables. Specifically, the physical system follows y p (x;9) = cos(2x) sin(|) 
and the associated computational model surface y s (x,9 ) = cos(2x) sin(|) sin( 2 ^ a ._^ 2+2 ) 
2nd ) exp( 2 , i 2 — |). Note that if the parameter 9 = (x — 2) 2 + 1 the compu- 


cos 


(x—2) 2 +l 1 


2(ai-2) 2 +2 2 > 

tational model matches the physical system. We name this dataset SD2. We use two-fold 
cross validation to select the regularization parameter A, which suggests A = 0.4 for SD1 
and A = 0.3 for SD2. Figure [l] illustrates the problem for SD2. 

To calculate the training error, we use m = 15 physical data points and n = 450 data 
points for the computational model. To calculate the test error, we use m = 30 physical 
data points and n = 900 points from the computational model. For SD1 the physical data 
points are uniformly sampled from [9|, and the computational model data points are 
sampled from [9|, x [j, 3|]. For SD2 the physical data points are sampled from [3|, 7r], 
and the computational model data points are sampled from [3|,7t] x [|,7t]. 

Table [l] shows the results for both SD1 and SD2. For each training or test case, we 
present the results for response prediction, measured by RMSE, and parameter prediction, 
measured by RMSEg. For test cases, the numbers in parentheses are the standard devia¬ 
tions of the four folds. The table shows that GNM outperforms NFC and PFC for all the 
comparison cases. In the following, we analyze some of the results in more detail. 

Comparing training cases of SD1 with SD2, we observe that the performance of PFC 
deteriorates on the dataset with a quadratic relationship between the control variables and 
the parameters, whereas for GNM and NFC the nonlinearity of function / does not affect 
the performance since neither fixes a parametric functional from a priori. An interesting 
observation for the test errors in SD1 is while NFC has a relatively small RMSE, its RMSE# 
is significantly worse. A closer look at the results revealed that the large value for RMSE# 
can be attributed to two reasons. First, for the test cases each method is provided with 
a smaller number of data points. For NFC, this resulted in the algorithm finding a local 
optimum in the region which has a similar response surface as the training region, but 
different values of the parameters. This occurs due to the fact that NFC uses function 
approximation for the computational model, thereby it assumes the function following 
similar trends over a large region. Note that this problem does not occur for all the 
possible ranges for the inputs, like the ranges used in (Pourhabib et ah, 2014) for building 
synthetic data. In fact, we realized this issue while trying different ranges for the data, and 
then used the aforementioned range in this paper to demonstrate this point. This can be a 
major drawback for some applications in which not only a good prediction for the response 
is desired, but also the relationship between the control variables and the parameters is 
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Table 1: Comparing NFC, PFC and GNM on synthetic datasets. 


SD1 

Method 

Training 

Test 

RMSE 

RMSE# 

RMSE 

RMSEfl 

NFC 

0.267 

0.454 

0.180(0.134) 

4.810(0.212) 

PFC 

0.220 

0.303 

0.351(0.131) 

0.642(0.552) 

GNM 

0.029 

0.056 

0.031(0.017) 

0.065(0.033) 

SD2 

Method 

Training 

Test 

Method 

RMSE 

RMSEe 

RMSE 

RMSE e 

NFC 

0.175 

0.173 

0.290(0.126) 

0.251(0.044) 

PFC 

0.380 

0.329 

0.571(0.0593) 

0.817(0.452) 

GNM 

0.017 

0.027 

0.043(0.022) 

0.046(0.012) 


sought (Pourhabib et al., 2015). 

In summary, the significantly lower RMSE and RMSE# for GNM, compared to NFC 
and PFC can be explained by noting that GNM does not model the computational model as 
a continuous surface. In addition, the way the graph G is constructed adequately describes 
the non-isometric matching nature of this problem. 


4.0.4 Real Data 

We use a real dataset from fabrication of buckypaper which is a nano-manufactured prod¬ 
uct composed of thin sheets of carbon-nano tubes (Tsai et al. 2011). An approach to 


enhance the mechanical properties of buckypaper involves adding PVA to the fabrication 
process. A computational model is then used to model the relationship between the output, 


measured by Young’s modulus of the buckypaper, and the input, the PVA amount (Wang 


2013). Here, the calibration parameter is the percentage of the PVA absorbed in the pro¬ 
cess. Based on the understanding of the physics of the problem, this parameter is related, 


through a functional relationship, to the input variable, the PVA amount (Pourhabib et al. 


2014). As such, this problem falls in the category of functional calibration or, based on the 


framework we proposed in this paper, non-isometric matching. 

We use m = 11 physical data points, where the input variables are uniformly distributed 
between 0.5 and 1. The number of computational model data points is 150. Since we do not 
know the true functional relationship between the parameter and the input, we can only 
evaluate the methods by calculating their response prediction. Table [2] shows the results 
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using both training error and 4-fold cross validation. For this dataset, NFC provides the 
best result for the training error. For the test case, GNM provides the best result. This 
demonstrates when we have a smaller number of training physical data points, the graph- 
theoretic approach is more reliable than both NFC and PFC. 

Table 2: Comparing NFC, PFC and GNM on buckypaper data. 


Buckypaper 


Training 

Test 

Method 

RMSE 

RMSE 

NFC 

0.121 

0.301(0.248) 

PFC 

0.291 

0.609(0.441) 

GNM 

0.276 

0.285(0.164) 


4.1 Selection of regularization parameter 


Here we discuss the impact of the numerical value of the regularization parameter A in 
equation Q. If A = 0 the weight for the edges in the graph is solely decided by the 
difference in the responses of the physical data points and the simulated data points in 
their neighborhood. This may result in a large difference between the second values of 
the parameters of the two consecutive points selected in model Q. On the other hand, 
if A is a very large number, the selected computational model points, which are used to 
estimate the true physical curve, may be very close to each other and the estimated curve 
may deviate from the actual physical curve. In our analysis we use 2-fold cross validation 
to select the parameter A. Based on our analysis, this approach identifies a good value 
for the regularization parameter. To test this, Figure [3] displays the test error for the two 
synthetic datasets as a function of A. The test error is calculated by randomly partitioning 
each dataset into 75% of training points and 25% for testing. The figure suggests that the 
value of the error initially reduces as a function of A then it sharply increases. For SD2, 
due to shape of the computational model surface, even a very small number close to zero 


provides a small prediction error. The values for A we obtained in Section 4.0.3 using cross 
validation error, which were 0.4 and 0.3 for SD1 and SD2, respectively, are very close to 
the optimal values in this figure. 


5 Conclusion 

In this paper, we developed a novel graph-theoretic approach for non-isometric curve to 
surface matching from incomplete data for functional calibration. The motivating example 
is the PVA-treated buckypaper fabrication process in which the incomplete curve is the set 
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Figure 3: Prediction error as a function of A for synthetic datasets. 


of physical data points and the incomplete surface is the set of computational model data 
points. In this framework, the functional calibration problem was recast as a shortest path 
problem on a directed acyclic graph. The solution to the shortest path problem is used to 
identify anchor points that are then used to fit a response function. On both synthetic and 
real data, we demonstrated that our new approach outperforms competing approaches in 
terms of prediction accuracy. 

An interesting question, which is of importance in many manufacturing problems, is how 
to generalize GNM to match low dimensional surfaces to higher dimensional surfaces. This 
problem arises when the physical process has more than one input, such as the welding 
example discussed in (Bayarri et ah, 2007). A major challenge in this case is that the 
points do not poses an inherent ordering like in the one-dimensional case. Therefore this 
generalization is not a straightforward task and will constitute our future research. 
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